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Abstract 

Bayes linear analysis and approximate Bayesian computation (ABC) are techniques 
commonly used in the Bayesian analysis of complex models. In this article we con- 
nect these ideas by demonstrating that regression-adjustment ABC algorithms pro- 
duce samples for which first and second order moment summaries approximate ad- 
justed expectation and variance for a Bayes linear analysis. This gives regression- 
adjustment methods a useful interpretation and role in exploratory analysis in high- 
dimensional problems. As a result, we propose a new method for combining high- 
dimensional, regression- adjustment ABC with lower-dimensional approaches (such as 
using MCMC for ABC). This method first obtains a rough estimate of the joint pos- 
terior via regression-adjustment ABC, and then estimates each univariate marginal 
posterior distribution separately in a lower-dimensional analysis. The marginal dis- 
tributions of the initial estimate are then modified to equal the separately estimated 
marginals, thereby providing an improved estimate of the joint posterior. We illustrate 
this method with several examples. 
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1 Introduction 

Bayes linear analysis and approximate Bayesian computation (ABC) are two tools that 
have been widely used for the approximate Bayesian analysis of complex models. Bayes 
linear analysis can be thought of either as an approximation to a conventional Bayesian 
analysis using linear estimators of parameters, or as a fundamental extension of the subjective 
Bayesian approach, where expectation rather than probability is a primitive quantity and 
only elicitation of first and second order moments of variables of interest is required (see e.g. 



Goldstein and Wooff 2007 for an introduction). In this article, we are interested in Bayes 
linear methods to approximate a conventional Bayesian analysis based on a probability 
model, and in particular in the setting where the likelihood is difficult to calculate. We write 
p(9) for the prior on a parameter 9 = (9\, 9 P ) T , p(y\9) for the likelihood and p(9\y) for the 
posterior. We discuss Bayes linear estimation further in the next section. 

Approximate Bayesian computation refers to a collection of methods which aim to draw 
samples from an approximate posterior distribution when the likelihood, p(y\9), is unavail- 
able or computationally intractable, but where it is feasible to quickly generate data from 



the model y* ~ p(y\9) (e.g. Lopes and Beaumont 2009 Bertorelle et al. 2010 Beaumont 



20l0l |Csillery et al. 2010| [S isson and Fan 20111). The true posterior is approximated by 



p(9\y) ~ p(9\s) where s = s(y) = (si, . . . , Sd) is a low-dimensional vector of summary 



statistics (e.g. Blum et al. 2011). Writing 

p(9,s*\s)<xK e (\\s-s*\\)p(s*\9)p(9), (1) 
where if e (||w||) = .K"(||tt||/e)/e is a standard smoothing kernel with scale parameter e > 0, 
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the approximate posterior itself is constructed as p(0\s) ~ f p{9, s*\s)ds*, following standard 
kernel density estimation arguments. The form of ([I]) allows sampler-based ABC algorithms 
(e.g. |Marjoram et al. 2003[ |Bortot et al. 2007| |Sisson et al. 2007[ |Toni et al. 2009 



Beaumont et al. 2009 Drovandi and Pettitt 2011) to sample from p(9,s*\s) without direct 



evaluation of the likelihood. 

Regression has been proposed as a way to improve upon the conditional density estimation 
of p{9\s) within the ABC framework. Based on a sample (9 1 , y 1 ), . . . , (9 n , y n ) from p{y\9)p(9) , 



and then transforming this to a sample 



? n ,s n ) from p{s\9)p{9) through s % 



s(y l ), Beaumont et al. (2002) considered the weighted linear regression model 



6 % = a + p 1 (s l - s) + e 



for i — 1, . . . , n, 



(2) 



where E{ are independent identically distributed errors, /3 is a d x p matrix of regression 
coefficients and a is a p x 1 vector. The weight for the pair (9 l , s 1 ) is given by X e (||s* — s\\). 
This regression model gives a conditional density estimate of p{9\s % ) for any s\ For the 
observed s, this density estimate is an estimate of the posterior of interest, p(9\s), and a + Ei 
is a sample from it. Writing least squares estimates of a and as a and j3, and the resulting 
empirical residuals as &j, then the regression- adjusted vector 



e i,a = i_ /3 Tf s i _ s \~ & + i . 



(3) 



is approximately a draw from p(9\s). Beaumont et al. (2002) do not consider the model ^ 
as holding globally, but instead consider a local-linear fit (this is expressed through specifying 
a kernel, K e , with finite support). Variations on this idea include extensions to generalised 



linear models (Leuenberger and Wegmann 2010) and non-linear, heteroscedastic regression 



based on a feed- forward neural network (Blum and Frangois 2010). The relative performance 



of the different regression adjustments are considered from a non-parametric perspective by 



Blum (2010) However, application of regression-adjustment methods can fail in practice if 



the adopted regression model is clearly wrong, such as adopting the linear model ^ for a 
mixture, or mixture of regressions model. 

The quality of the approximation p(6\y) ~ p(6\s) depends crucially on the form of the 
summary statistics, s. Equality p(9\y) = p(9\s) only occurs if s is sufficient for 9. However, 



reliably obtaining sufficient statistics for complex models is challenging (Blum et al. 2011), 
and so an obvious strategy is to increase the dimension of the summary vector, d = dim(s), 
to include as much information about y as possible. However, the quality of the second 
approximation, p(6\s) ~ J p(6, s*\s)ds* , is largely dependent on the matching of vectors of 
summary statistics within the kernel K e , which is itself dependent on the value of e. Through 



standard curse of dimensionality arguments (e.g. Blum 2010), for a given computational 
overhead (i.e. for a fixed value of e), the quality of the second approximation will deteriorate 
as d increases. As a result, given that more model parameters, 9, imply more summary 
statistics, s, this reality is a primary reason why ABC methods have not, to date, found 
application in moderate to high-dimensional analyses. 

In this article we make two primary contributions. First, we show there is an interesting 
connection between Bayes linear analysis and regression-adjustment ABC methods. In par- 



ticular, samples from the regression- adjustment ABC algorithm of Beaumont et al. (2002) 



result in first and second order moment summaries which directly approximate Bayes linear 
adjusted expectation and variance. This gives the linear regression- adjustment method a 
useful interpretation for exploratory analysis in high dimensional problems. 

Motivated by this connection, our second contribution is to propose a new method 
for combining high-dimensional, regression-adjustment ABC with lower-dimensional ap- 
proaches, such as MCMC. This method first obtains a rough estimate of the joint posterior, 
p(9\s), via regression- adjustment ABC, and then estimates each univariate marginal poste- 
rior distribution, p(9i\s), separately with a lower-dimensional ABC analysis. Estimation of 
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marginal distributions is substantially easier than estimation of the joint distribution be- 
cause of the lower dimensionality. The marginal distributions of the initial estimate are then 
modified to be those of the estimated univariate marginals, thereby providing an improved 
estimate of the joint posterior. Similar ideas have been explored in the density estimation 



literature (e.g. Spiegelman and Park 2003 Hall and Neumeyer 2006 Giordani et al. 2009). 



As a result, we are able to extend the applicability of ABC methods to problems of moderate 
to high dimensionality - comfortably beyond current ABC practice. 

This article is structured as follows: Section [2] introduces Bayes linear analysis, and 



explains its connection with the regression-adjustment ABC method of Beaumont et al. 



(2002) Section [3] describes our proposed marginal adjustment method for improving the 
estimate of the ABC joint posterior distribution obtained using regression-adjustment ABC. 
A simulation study and real data analyses are presented in Section [4j and Section [5] concludes 
with a discussion. 

2 A connection between Bayes linear analysis and ABC 
2.1 Bayes linear analysis 

As in Section 1, suppose that s = s(y) = (s\, ...,Sd) T is some vector of summary statistics 
based on data y, and that 9 = (9 1 , ...,9 P ) T denotes parameter unknowns that we wish to learn 



about. One view of Bayes linear analysis (e.g. Goldstein and Wooff 2007) is that it aims to 



construct an optimal linear estimator of 9 under squared error loss. That is, an estimator of 
the form a + Bs, for a p-dimensional vector, a, and a p x d matrix, B, minimising 

E[{9 - a - Bs) T {9 - a - Bs)). 
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The optimal linear estimator is given by 



E s (9) = E{9) + Cov (0, s)Var(s)^ 1 [s - E{s)}, (4) 

where expectations and covariances on the right hand side are with respect to the joint prior 
distribution of s and 9 i.e. p(s\9)p(9). The estimator, E s (9), is referred to as the adjusted 
expectation of 9 given s. If the posterior mean is a linear function of s then the adjusted 
expectation and posterior mean coincide. Note that obtaining the best linear estimator 
of 9 does not require specification of a full prior or likelihood - only first and second order 
moments of (9, s) are needed. From a subjective Bayesian perspective, the need to make only 
a limited number of judgements concerning prior moments is a key advantage of the Bayes 



linear approach. There are various interpretations of Bayes linear methods - see Goldstein 



and Wooff (2007) , for further discussion. In the ABC context, a full probability model 
is typically available. As such, we will consider Bayes linear analysis from a conventional 
Bayesian point of view as a computational approximation to a full Bayesian analysis. 

The adjusted variance of 9 given s, Vai s (9) = E[(9 — E S (9)) J \9 — E s (9)], can be shown 
to be 

Var s (#) = Var(fl) - Cov(fl, s)Var(s)- 1 Cov(s, 9). 

Furthermore, the inequality Var s (0) > £'[Var(^|s)] holds, where A > C means that A — C 
is non-negative definite, and the outer expectation on the right hand side is with respect 
to the prior distribution for s, p(s). This inequality indicates that Var s (6') is a generally 
conservative upper bound on posterior variance, although it should be noted that Var s (6*) 
does not depend on s, whereas Var(6'|s) is fully conditional on the observed s. If the posterior 
mean is a linear function of s, then Var s (#) = £'[Var(0|s)] 
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2.2 Bayes linear analysis and regression adjustment ABC 



It is relatively straightforward to link the regression-adjustment approach of Beaumont et al. 



(2002) with a Bayes linear analysis. However, note that Beaumont et al. (2002) do not 
consider the model ^ as holding globally, but instead assume that it holds locally around 
the observed summary statistics, s. We discuss this point further below, but for the moment 
we assume that the unweighted linear model Q holds globally, after an appropriate choice 
of the summary statistics, s. 

The ordinary least squares estimate of f3 under the linear model (j2p is j3 = £(s) -1 £(s, 9), 
where S(s) is the sample covariance of s 1 , ...,s n and S(s, 9) is the sample cross covariance 
of the pairs (s\9 l ), i = l,...,n. For large n (where n is a quantity under direct user control 
in an ABC analysis), $ is approximately /3 = Var(s) _1 Cov(s, 9), where Var(s) and Cov(s,#) 
are the corresponding population versions of E(s) and S(s, 9). As such, for large n, the mean 
of one of the samples 9 l ' a (c.f. p| is approximately 



E(9 i ' a ) w E[9 l -/3 T (s i - s)} 

= E{9) + Cov(^, s)Ybi(s)- 1 [s - E(s)} 
= E s {9). 

That is, the mean of the regression-adjusted 9 l ' a is the Bayes linear adjusted expectation of 
9 given s, if n is large. Similarly, and also for large n, we have 

Var(^' a ) rj Var[^-/3 T (s l -s)] 

= Var(#) + /3 T Var(s)/3 - 2Cov(9, s)/3 

= Var(#) - Cov(9, s)Var(s)" 1 Cov(s, 9) 

= Var s (9). 
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Hence, the covariance matrix of the regression-adjusted 9 l,a approximates the Bayes linear 
adjusted variance for large n. 

These results demonstrate that the first and second moments of the regression- adjusted 



samples 9 l ' a , i = l,...,n in the linear method of Beaumont et al. (2002) have a useful 
interpretation, regardless of whether the linear assumptions of the regression model (J2j) hold 
globally, as a Monte Carlo approximation to a Bayes linear analysis. This connection with 
Bayes linear analysis is not surprising when one considers that a Monte Carlo approximation 
to Q based on draws from the prior is just a least squares criterion for regression of 9 
on s. Usefully for our present purposes, the Bayes linear interpretation may be helpful 
for motivating an exploratory use of regression adjustment ABC, even in problems of high 
dimension. The connection between Bayes linear methods and regression- adjustment ABC 
continues to hold if kernel weighting is reincorporated into the regression model (j2|. Now 
consider the model ([I]) in general and a Bayes linear analysis using first and second order 
moments of (9, s*)\s with Bayes linear updating by the information s = s*. This then 



corresponds to the kernel weighted version of the procedure of Beaumont et al. (2002) 

A recent extension of regression-adjustment ABC is the nonlinear, heteroscedastic method 
of Blum and Frangois (2010"j| which replaces ^ with 



9 l = //(<) + <7(sV , (5) 



where fi(s l ) = E(9\s = s l ) is a mean function, <j(s 1 ) is a diagonal matrix with diagonal 
entries equal to the square roots of the diagonal entries of Var(0|s = s l ), and the e % are i.i.d. 
zero mean random vectors with Var(e l ) = /. It is possible also to take a{s) to be some 
matrix square root of Var(#|s = s l ) where all elements are functions of s. If ^ holds, then 
the adjustment 

0*.« = + a{s)a{s i Y X [0 i - //(s*)] 
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is a draw from p(9\s). The heteroscedastic adjustment approach does seem to be outside the 
Bayes linear framework. However, a nonlinear mean model for fj,(s) with a constant model 
for cr(s) can be reconciled with the Bayes linear approach by considering an appropriate ba- 



sis expansion involving functions of s. Blum (2010) gives some theoretical support for more 
complex regression adjustments through an analysis of a certain quadratic regression adjust- 
ment and suggests that transformations of 9 can be used to deal with heteroscedasticity. In 
this case, the Bayes linear interpretation would be more broadly applicable in regression- 
adjustment ABC. An interesting recent related development is the semi-automatic method 
of choosing summary statistics of Fernhead and Prangle (2012)| They consider an initial 
provisional set of statistics and then use linear regression to construct a summary statistic 
for each parameter, based on samples from the prior or some truncated version of it. Their 
approach can be seen as a use of Bayes linear estimates as summary statistics for an ABC 
analysis. There are several other innovative aspects of their paper but their approach to 
summary statistic construction provides another strong link with Bayes linear ideas. 



3 A marginal adjustment strategy 

Conventional sampler-based ABC methods, such as MCMC and SMC, which use rejection- 
or importance weight-based strategies, are hard to apply in problems of moderate or high 
dimension. This occurs as an increase in the dimension of the parameter, 9, forces an increase 
in the dimension of the summary statistic, s. This, in turn, causes performance problems 
for sampler-based ABC methods as the term lf e (||s — s*\\) in suffers from the curse of 



dimensionality (Blum 2010). On the other hand, regression-adjustment strategies, which can 
often be interpreted as Bayes linear adjustments (see Section [2]), can be useful in problems 
with many parameters. However, it is difficult to validate their accuracy, and sampler-based 
ABC methods may be preferable in low dimensional problems, particularly when simulation 
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under the model is computationally inexpensive. 

We now suggest a new approach to combining the low-dimensional accuracy of sampler- 
based ABC methods, with the utility of the higher- dimensional, regression- adjustment ap- 
proach. In essence, the idea is to construct a first rough estimate of the approximate posterior 
using regression-adjustment ABC, and also separate estimates of each of the marginal distri- 
butions of 8i\s, . . . , 9 p \s. Estimating marginal distributions is easier than the full posterior, 
because of the reduced dimensionality of summary statistics required to be informative about 
a single parameter. Because of the lower dimensionality, each marginal density can often 
be more precisely estimated by any sampler- or regression-based ABC method, than the 
same margin of the regression-based estimate of the joint distribution. We then adjust the 
marginal distributions of the rough posterior estimate to be those of the separately estimated 
marginals, by an appropriate replacement of order statistics. The adjustment of the marginal 
distributions maintains the multivariate dependence structure in the original sample. When 
the marginals are well estimated, it is reasonable to expect that the joint posterior is better 
estimated. 

Precisely, the procedure we use is as follows: 

1. Generate a sample (9\s l ), i = 1, ...,n from p(6)p(s\9). 

2. Obtain a regression adjusted sample 9 t,a , i — 1, ...,n based on either the model ([2j or 
(|5| fitted to the sample generated at step 1. The regression adjusted methods may be 
implemented with or without kernel weighting. 

3. For j = 1, 

(a) For the marginal model for 9j, 




10 



where Q_j is 9 with the element 9j excluded, identify summary statistics s(j) = 
(s(j)i, s(j) c i(j)) T that are marginally informative for Oj. 

(b) Use a conventional ABC method to estimate the posterior distribution for 9\s(j). 
Extracting the j th component results in a sample, 9™ 1 ' 1 , i = 1, n. If the number 
of samples drawn is not n, then we obtain a density estimate based on the samples 
we have and then define 9J 1 ' 1 , i — 1, ...,n to be n equally spaced quantiles from 
the density estimate. 

(c) Replace the i — 1, . . . , n order statistics for the j th component of the sample 9 a,t , 
by the equivalent quantiles of the marginal samples 9 

More precisely, writing 9™ and 9^ as the k th order statistic of the samples 9™' 1 
and 9j' 1 , respectively, i = 1, ...,n then we replace 9j' with 9j^ for i = 1, . . . , n. 
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The samples, 9 a,t , with all p margins adjusted are then taken as an approximate sample 
from the ABC posterior distribution. The idea of incorporating knowledge of marginal 
distributions into estimation of a joint distribution has been previously explored in the 



density estimation literature. Spiegelman and Park (2003) consider parametrically estimated 



marginal distributions and then replacing order statistics in the data by the quantiles of the 
parametrically estimated marginals. This is similar in spirit to our adjustment procedure in 
the ABC context. They show by theoretical arguments and examples that improvements can 



be obtained if the parametric assumptions are correct. Hall and Neumeyer (2006) consider 
density estimation when there is a dataset of the joint distribution as well as additional 
datasets for the marginal distributions. They consider a copula approach to estimation of 
the joint density and show that the additional marginal information is beneficial if the copula 
is sufficiently smooth. Recently, Giordani et al. (2009) | have considered a mixture of normals 



copula approach where the marginals are also estimated as mixtures of normals. 

A powerful motivation for using available marginal information comes from the fact that 
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a joint distribution is determined by the univariate distributions of all its linear projections. 
This arises as the characteristic function of the joint distribution is determined from the 
characteristic functions of one dimensional projections. Hence adjusting the distribution of 
all linear projections of a density estimate to be correct would result in the true distribution 
being obtained. By adjusting marginal distributions we only consider a selected small number 
of linear projections. However, we expect that if good estimates of marginal distributions 
are available, then transforming a rough estimate of the joint density to take these marginal 
distributions will be beneficial. 

Note that estimation and adjustment of the p marginal distributions in Step 3 may be 
performed in parallel, so that computation scales well with the dimension of 6. Because 
the Bayes linear adjusted variance, Var s (0), is generally a conservative upper bound on the 



posterior variance (see Section 2.1), it is credible that the initial rough samples 6 l ' a could 
form the basis of initial sampling distributions for importance-type ABC algorithms (e.g. 



Sisson et al. 2007 Beaumont et al. 2009 Drovandi and Pettitt 2011), resulting in potential 



computational savings. Finally we note that a number of methods exist to quickly determine 



the appropriate statistics, s(j), for each marginal analysis. The reader is referred to Blum 



et al. (2011) for a comparative review of these. 



4 Examples 

4.1 A Simulated Example 

We first construct a toy example where the likelihood can be evaluated and where a gold 
standard answer is available for comparison. While ABC methods are not needed for the 
analysis of this model, it is instructive for understanding the properties of our methods. 
We consider a p-dimensional Gaussian mixture model with 2 P mixture components. The 
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likelihood for this model is given by 



1 i 



b 1= b p =0 li=l 



n^(i 



cu) bi 



<i> p (s\n(b,e),x) 



where <p p (x\a, B) denotes the p-dimensional Gaussian density function with mean a and 
covariance B evaluated at x, u G [0, 1] is a mixture weight, //(&, 9) T = ((1 — 2b±)9i, . . . , (1 — 
2b p )9p), b = (bi, . . . ,b p ) with 6j e {0, 1} and £ = [E^] is such that E„ = 1 and E^ = p for 
i ^ j. Under this setting, the marginal distribution for Sj is given by the two-component 
mixture 

= (1 - - B u 1) + cW>i(*, 1). (6) 

The combination of the p two-mixture- component marginal distributions forms the TP mix- 
ture components for the p-dimensional model. Given 9, data generation under this model 
proceeds by independently generating each component of b to be or 1 with probabilities u 
and 1 — co respectively, and then drawing s\9, b ~ 4> p (fi(b, 9), E). 

For the following analysis we specify s = (5,5, ...,5), u = 0.3 and p = 0.7, and re- 
strict the posterior to have finite support on [—20, 40] p , over which we have a uniform prior 
for 9. Computations are performed using 1 million simulations from p{s\9)p(9), using a 
uniform kernel if £ (||tt||), where || ■ || denotes Euclidean distance, and where e is chosen to 
select the 10,000 simulations closest to s. We contrast results obtained using standard rejec- 



tion sampling, rejection sampling followed by the regression- adjustment of Beaumont et al. 



(2002) , and both of these after applying our marginal- adjustment strategy. All inferences 



were performed using the R package abc (Csillery et al. 2011) 



Figure []Ja) illustrates the relationship between 9 1 and si (all margins are identical), with 
around 70% of summary statistics located in the line with negative slope. The observed 



13 





aoueBjaAiQ ~\y\ 




~i 1 1 1 1 r 

OS'O CH'O OO'O 




o 







SV 01 9 9- 01- 



aj o 

g 
o 

'Eh 

o 



fH 

o 

'sh 

a 

o 



o 
. PI 



aj 



PI 

o 
o 



o 



X3 

g 



X 

o 

o 
-d 



^ p2 

Pi =+h -5 

o 

-I " 

aj CT 1 pH 
±i ^ 

3 ^ 

w SS 2 

CU fe -d 

Ptf o ^ 
nd 



.2 



o 
o 



o 
o 



o 

Pi 

aj 

HH 



o 
fH 

g 

h cu 

-g -t^ 

ft, aj 

PT Sh 



Pi 

2 ^ 

x bO 

rj CO 0) 

CU faO fn 

£ £ CD 



CP 



Hd s 

3 nd 

.2 ™ 

+^ , — ^ 

° 

'qj ' 

Sh 

CU Pi 

^ aj 

+= , v 

CO r-Q 

•+J >■ ' 

£ x 

^ Pi 

aj aj 
CU 

Pi « 

aj I, 

PL, 11 

P-! 03 

CU 

cj 

2 "+ 3 

3 .J 

CU aj 

SH +3 



>, 

aj 



3 3 
2 § 

p> -a 

aj CU 
> 
fn 

2 co 

o 



s r 

ffl » s 

2 5 03 

li ^ s 

§ J a 

'3 ■ 1 c 

to T3 " 

P cu ^ 

^2 g 

-d $ ® 

> ^ 2 

p= g 

Jp< CD 

— d "3 

.3 3 

CO 

03 ?h O 

g -2 & 



aj 
x 

pj 

.2 

o 
o 

'5? 

a 
+j 
o 
pj 
o 
T3 



x 

aj 

^aj 
Pi 

'5b 

^ d 

cu n 
> 



o 

aj 
O 

.3 

X 
M 

3 

o 

Pi 
o 

O 

CO 



.3 3 

a '5b 

I § 



o 
o 

X 

w 

x 
o 

Pi 



Pi 
.2 

-(-= 
o 
cu 

" 'a? 



Pi ^ .-s 



pi 
o 



p! 
o 



o 



3 "3 

oj 



<3S 



CU 



.2 3-> 



X CO 

cm 



O 

£ - CU 



fH .O 

aj 



a 



3 cu ^ 

a p 5 2 

|£ S 

S . 2 

d ^ s 



cu x 

+3 PI 2 

o o ^d 

GU -S 



— 1 aj 

.2 

5b .3 

r 'd co 
d p 

aj d 

2 ^ 
m ^ 

. aj 

a, Q 
.2 ^ 

X cu 

d g 

CU B 



o 
'Ci 
o 
+= 
x 
O 
& 

o 



oj -H 

X ^ 

d ^ 
o ^ 

o t3 
_cu d 
57 oj 
fn ^ s 

o n> 
* — ' 



08 

G "cu 

oj > 



> 
o 



o 
o 

cu Tu 



d 

oj 

o 



5 



g t3 



3 3 



d 

o 



d 

o 

X 

o 



o 
o 



oj 

d 

o 



4J x 

oj cu 



B "3 p 



X o -d 



g cu 

° nd 

cu 

2 o 
d +s 

CU TU 

^ g 
i — i » 

Ph 

X 

c 

C 

o 



x 
d 

cu 

bC 3 
d -p! 
•3 ^ 

> 3 

o 
o 



o 
d 



s go 



X 

X 

o 
bJ3 

•2 S 
g ho 

oj g 
x O 

Oj 

- J3 

CU 

S g 
cu g 



T3 
d 

oj 



oj <d X 



oj ^j 



^1 

pq 

<: 

cu 

CU E pi 
nd CU oj 

^ ^ Ph 

0) aj 

P^ ^ 

a fn p 

.2 S 

fH 

cu x 
+3 d 

X .P-j 

o -a 

Ph aj 



x 

d 



14 



summary statistic is indicated by the horizontal line, the marginal posterior distribution 
for 9i defined by the implied density of summary statistics on this line. Figure [jjb) shows 
density estimates of 9i\s using rejection sampling for p = 1, 2, . . . , 5 model dimensions. The 
univariate true marginal distribution is indicated by the dashed line. As model dimension 
increases, the quality of the approximation to the true marginal distribution deteriorates 



rapidly. This is due to the curse of dimensionality in ABC (e.g. Blum 2010) in which 
the restrictions on si for a fixed number of accepted samples (in this case 10,000) decrease 
within the comparison \\s — s*|| as p increases. Beyond p = 2 dimensions, these density 
estimates are exceptionally poor. The same information is illustrated in Figure [l|c) after 
applying the linear regression-adjustment of Beaumont et al. (2002) | to the samples obtained 
by rejection sampling in Figure [lib). Clearly the regression-adjustment is beneficial in 
providing improved marginal density estimates. However, the quality of the approximation 
still deteriorates quickly as p increases, albeit more slowly than for rejection sampling alone. 

Figures [l|e) and (f) show the two dimensional density estimates of (9i, 6 , 2 ) | s for the p = 3 
dimensional model, respectively using rejection sampling, and rejection sampling followed 
by the linear regression-adjustment. The superimposed contours correspond to those of the 
true bivariate marginal distribution. The improvement to the density estimate following 
the regression-adjustment is clear, however even here, the component modes appear to be 
slightly misplaced, and there is some blurring of density with neighbouring components. 

Figures [l](g) and (h) correspond to the densities in Figures [Ije) and (f) after the im- 
plementation of our marginal adjustment strategy. Here, each margin of the distributions 
is adjusted to be that of the appropriate univariate marginal density estimate in a p = 1 
dimensional analysis. E.g. the margins for 9\ are adjusted to be exactly the (p = 1) density 
estimates in Figures [l](b) and (c). In both plots (g and h) there is a clear improvement in the 
bivariate density estimate: the locations of the mixture components are in the correct places, 
and on the correct scales. Some of the accuracy of the dependence structure is less well cap- 
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tured under just rejection sampling, however (Figure pig))- Here, the correlation structure of 
each Gaussian component seems to be poorly estimated, compared to that obtained under 
the regression-adjustment transformed samples. The message here is clear: any marginal 
adjustment strategy cannot recover from a poorly estimated dependence structure. The re- 
gression adjusted density in Figure [l](f) more accurately captures the correlation structure of 
the true density, and this improved dependence structure is carried over to the final density 
estimate in Figure []Jh). 



Finally, Figure |T](d) examines the quality of the ABC approximation to the true density, 

p(6\s). Plotted is the Kullback-Leibler divergence, J p(6) log [p(6) / q(6)] d8, between the den- 
sities of the first two dimensions of each distribution as a function of model dimension (where 
p(9) = p(9i,02\s) is the true density, and q{6) = q(9i, 6* 2 |s) is a kernel density estimate of the 
ABC approximation). The divergence is computed by Monte Carlo integration using 2,000 
draws from the true density. We compare only the first two dimensions of the p-dimensional 
posteriors to maintain computational accuracy, noting that all pairwise marginal distribu- 
tions of the full posterior are identical in this analysis (similarly for all higher-dimensional 
marginals) . 



Figure [T[d) largely supports our previous conclusions. The performance of the rejection 

sampler and the rejection sampler with regression-adjustment deteriorates rapidly as the 
number of model dimensions (i.e. summary statistics) increases, although the latter performs 
better in this regard. There is a clear improvement to both of these approaches gained 
though our marginal adjustment strategy, with the modified regression- adjustment samples 
performing marginally better (for this example) where the original regression- adjustment 
provides better estimates of the multivariate dependence structure in lower dimensions. 

After around p = 5 dimensions there is little difference between the two marginally ad- 
justed posteriors, and the divergence levels off to a constant value independent of model 
dimension. This is result of the ABC setup for this analysis. Beyond around p = 5 di- 






16 




Figure 2: The heather incidence data representing a 10 x 20 metre region (Diggle, 1991). 

mensions, there is little difference between the rejection sampling and regression- adjusted 
posteriors (e.g. Figures [l|e) and (f)), both largely representing near-uniform distributions 
over 9. Hence, our marginal adjustment strategy is only able to make the same degree of 
improvements, regardless of model dimension. The correlation dependence structure is also 
lost beyond this point, so the expected benefit of the regression-adjustment prior to marginal 
regression adjustment, is nullified. Using a lower initial threshold, e (computation permit- 
ting), would allow a more accurate initial ABC analysis, and hence more discrimination 
between the rejection sampling and regression-adjustment approaches. 



4.2 Excursion set model for heather incidence data 



We now consider the medium resolution version of the heather incidence data analysed by 



Diggle (1981), which is available in the R package spatstat (Baddeley and Turner 2005). 
Figure [2] illustrates the data, consisting of a 256 x 512 grid of zeros and ones, with each 
binary variable representing presence (1) or absence (0) of heather at a particular spatial 



location. Nott and Ryden (1999) used excursion sets of Gaussian random fields to model a 
low resolution version of these data. Without loss of generality, we assume that the data are 
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observed on an integer lattice. 

Let {Y(t); t G IR 2 } be a stationary Gaussian random field with mean zero and covariance 
function 



R{s,t) = Cov(Y(s),Y(t)) =exp[-(t- s) T A(t- s)] 

where s,t G M 2 and where A is a symmetric positive definite matrix. Hence R(s, t) cor- 
responds to the Gaussian covariance function model with elliptical anisotropy. The w-level 
excursion set of Y(i) is defined as E U (Y) = {teR 2 : Y(t) > u}, so that E U (Y) is obtained 
by "thresholding" Y(t) at level u G R. For background on Gaussian random fields and 



geometric properties of their excursion sets see e.g. Adler and Taylor (2007) 

We model the heather data as binary random variables which are indicators for inclu- 
sion in an excursion set on an integer lattice. The data are denoted B = {B(i,j) : i = 
0, ...,255, j = 0, ...,511} where B(i,j) = G E U (Y)) and where /(•) denotes the in- 

dicator function. The distribution of B clearly depends on u and A. We write the (i,j)th 
element of A as Aij. Since A is symmetric, A 12 = A 2 \. We parametrize the distribu- 
tion of B through 9 = {6\, Qi, O3, 6*4) where 61 = u, 9 2 = log An, 9 3 = logA 22 and 6*4 = 
logit[(Ai 2 / y/ AuA 2 2 + l)/2]. We adopt the independent prior distributions 8 1 ~ iV(0,0.5 2 ), 
9 2 ,9 3 ~ iV(— 4,0.5 2 ) and 6*4 ~ A^(0,0.5 2 ). Simulation of Gaussian random fields is achieved 



with the RandomFields package in R ( |Schlather 2011 ), using the circulant embedding algo- 



rithm of Dietrich and Newsam (1993) and Wood and Chan (1994)| 



For summary statistics, denote by nu(v) for v G M 2 the number of pairs of variables in 
B, separated by displacement v, which are both 1. Similarly denote by noo(^) the number 
of such pairs which are both zero, and by n m (v ) the number of pairs where precisely one of 
the pair is 1 (the order does not matter). In terms of estimating each marginal distribution 
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#i|s(l), . . . , 6*415(4), we specify 

s(l) = ^S(i,j)/(256 x 512) 

s(2) = [nn(fi),noo(wi),n i(wi)] T 
s(3) = [nii(f 2 ),n o(w2),^oi(^2)] T 

s(4) = [nn(u 3 ), nii(u 4 ), ^00(^3), ^00(^4), ^01(^3), «oi (^)] T 

as the summary statistics for each parameter, where v\ = (0, 1), v% — (1, 0), V3 — (1, 1) and 
v 4 = (1,-1). 

For the joint posterior regression-adjustment, we used the heteroscedastic, non-linear 
regression ([5j (Blum and Frangois 2010), using the uniform kernel, K e (\\ ■ ||), with scale 



parameter set to give non-zero weight to all 2,000 samples (#\ s l ) ~ p(s\6)p{9), and where 
|| ■ || represents scaled Euclidean distance. The individual marginal distributions were esti- 
mated in the same manner, but with the kernel scale parameter specified to select the 1,000 
simulations closest to each s(j). All analyses were again performed using the R package abc 



(Csillery et al. 2011) with the default settings for the heteroscedastic nonlinear method. 

Figure [3] shows estimated marginal posterior distributions for the components of 9 ob- 
tained by the joint regression-adjustment (dotted lines), and the same margins following 
our marginal adjustment strategy (solid lines). The estimates for the spatial dependence 
parameters 82 and 83 are poor for the joint regression approach - the individually estimated 
marginals are estimated very accurately, which can be verified by a rejection based analy- 
sis with a much larger number of samples (results not shown). Clearly if we use samples 
from the approximate joint posterior distribution from the global regression for predictive 
inference or other purposes, the fact that the unadjusted marginals are centred in the wrong 
place can lead to unacceptable performance of the approximation. 

It is interesting to understand why the global regression approach fails here. Some insight 

19 




Figure 3: Estimated marginal posterior distributions of 6\, . . . , 64 for the heather incidence 
analysis. Solid lines denote individually estimated marginals; dotted lines illustrate estimated 
margins from the joint posterior analysis 
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Figure 4: Plot of 82 versus n i(t>i) (left) and 63 versus n i(t>2) (right) for samples from the 
prior for the heather incidence data. 

can be gained from Figure |1J which illustrates (prior predictive) scatter plots of 82 versus 
noi(vi) and 83 versus ^01(^2)- The summary statistics noi(fi) and ^01(^2) are those which 
are most informative about 8 2 and 8 3 respectively. If we consider regression of each of 
these parameters on the summary statistics, the graphs show that not only the mean and 
variance, but also higher order properties, such as skewness of the response, appear to change 
as a function of the summary statistics. As such, the heteroscedastic regression- adjustment 
based on flexible estimation of the mean and variance does not work well here. Making the 
regression local for each marginal helps to overcome this problem. 

4.3 Analysis of an AWBM computer model 

We now examine methods for the analysis of computer models, where we aim to account for 
uncertainty in high-dimensional forcing functions, assessment of model discrepancy and data 
rounding. An approximate treatment of this problem is interesting from a model assessment 
point of view, where we want to judge whether the deficiencies of a computer model are such 
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that the model may be unfit for some purpose. 

A computer model can be regarded as a function y = f(rf) where 77 are model inputs and 
y is a vector of outputs. In modelling some particular physical system, observed data, d, 
is typically available that corresponds to some subset of the model outputs, y. The model 
inputs, 77, can be of different types. Here we only make the distinction between model pa- 
rameters, 6*, and forcing function inputs, u, so that 77 = (u,0*). Commonly, measurements 
of the forcing function inputs are available, and uncertainty in these inputs (due to e.g. sam- 
pling and measurement errors) will be ignored in any analysis due to the high-dimensionality 
involved. An uncertainty analysis (involving an order of magnitude assessment of output 
uncertainty due to forcing function uncertainty) will often be performed, rather than at- 
tempting to include forcing function uncertainty directly in a calibration exercise (see, for 



example, Goldstein et al. 2010| for an example of this in the context of a hydrological model). 



See e.g. |Craig et al. (1997)[ |Goldstein and Rougier (2009) \ |Kennedy and O'Hagan (2001) 



and Goldstein et al. (2010) for further discussion of different aspects of computer models 

We now assume that y = f(rj) corresponds to a prediction of the observed data d in the 
model 

d = f(v)+9 + e, (7) 
where e denotes measurement error and other sources of error independent in time, and 



g is a correlated error term representing external model discrepancy (see Goldstein et al. 



2010 for a discussion of the differences between internal and external model discrepancies). 
We directly investigate forcing function uncertainty, through the term f(i]), using ABC. In 
the analysis of the model (17]), we also consider data rounding effects, so that simulations 
produced from ^ are rounded according to the precision of the data that was collected. 
Handling such rounding effects is very simple in the ABC framework. 
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Figure 5: The three-catchment Australian water balance model. 



As a computer model, we consider the Australian Water Balance Model (AWBM) ( |Bought jn 



2004), a rainfall-runoff model widely used in Australia for applications such as estimating 



catchment water yield or designing flood management systems. As shown in Figure |5j the 
model consists of three surface stores, with depths cx, 02,03 and fractional areas 01,02,03 
with Y2k a k = I; an d a base store. Model forcing inputs are precipitation and evapotran- 
spiration time series, from which a predicted streamflow is produced. At each time step in 
the model, precipitation is added to the system and evapotranspiration subtracted, with the 
net input split between the surface stores in proportion to the fractional areas. Any excess 
above the surface store depths is then split between surface runoff and flow into the base 
store according to the baseflow index < BFI < 1. Water from the base store is discharged 
into the stream at a rate determined by the recession constant < K < 1, and the total 
discharge (streamflow) is then determined as the sum of the surface runoff and the baseflow. 
Following Bates and Campbell (2001)| we fix BFI = 0.4, although in some applications 
it may be beneficial to allow this parameter to vary. The model parameters are therefore 
6* = (ci, 02,03, ax, 02, K), as well as the high- dimensional evapotranspiration and precipita- 
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tion forcing inputs, u. In hydrological applications there is often great uncertainty about the 
precipitation inputs in particular, due to measurement and sampling errors. Here we assume 
that evapotranspiration is fixed (known), and we use w & s to denote the series of observed 
precipitation values only. In running the computer model, we initialize with all stores empty 
and discard the first 500 days of the simulation to discount the effect of the assumed initial 
conditions. Our data consist of a sequence of 5500 consecutive daily streamflow values from 
a station at Black River at Bruce Highway in Queensland, Australia. The catchment covers 
an area of 260km 2 with a mean annual rainfall of 1195mm. 

To complete the determination of the computer model ([7]), we specify the model priors. 
Writing 77 = (u, 9*), we describe the uncertainty on the true forcing inputs, u = (ux, . . . , ut), 
as u t = 8tU bs,t, where the random multiplicative terms have prior log5 t ~ iV(— crf/2, ex 2 ) for 
t = 1, ...,T. We set as ~ [7(0, 0.1), and note that E(5 t ) = 1 a priori. For the external model 



discrepancy parameters, g = (gi, . . . , gr) T , we specify g ~ N(0, S g ) where S g = [S 



9M\ 



IS 



such that Yigjj = <r 2 exp(— p\i — j\) , a g ~ [7(0,0.1) and p ~ [7(0.1,1). Independent model 
errors, e = (ei, . . . ,er) T , are assumed to be e t ~ A^(0,cx 2 ), for t — 1, ...,T where a e ~ C7[0, 2]. 
AWBM parameter priors are specified as c\ ~ [7(0,500), C2 — ci ~ [7(0,1000), C3 — C2 ~ 
[7(0,1000), [log(a 1 /a 3 ),log(a 2 /a 3 )] T ~ A^(0,0.5 2 /) and K ~ 0.271 x Beta(5.19, 4.17) + (1 - 



0.271) x Beta(255, 9.6), where the latter is a mixture of beta distributions. See Bates and 



Campbell (2001)] for discussion of the background knowledge leading to this prior choice. 



If we treat the forcing inputs, u, as nuisance parameters, our parameter of interest is 
= (#* T ,7 T ) T , the set of AWBM model parameters, and 7 = (a e , a g , cr q , p) T , those param- 
eters specifying distributions of the stochastic terms in ([7]). The ABC approach provides a 
convenient way of integrating out the high- dimensional nuisance parameter, u, while dealing 
with complications such as rounding in the recorded data (the streamflow data are rounded 
to the nearest 0.01mm). This would be very challenging using conventional Bayesian com- 
putational approaches. 
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To define summary statistics, denote 9* as the posterior mode estimate of 9* in a model 
where we assume no input uncertainty, u = u> b s , and where we log-transform both the 
data and model output. Also denote by s 7 = [ip(0), if)(l), ^(2), C(l)? C(2), C(3)] T , where 
is the lag j autocovariance of the least squares residuals d — f(fj), and is the lag j 
autocovariance of (d — f{fj)) 2 with fj = (u b s , 9*). In the notation of Section [3J for summary 
statistics for 9j, j = 1, ...,6 (i.e. the components of 9*; the AWBM parameters) we use the 
statistic s(j) = 9* and for 9j, j = 7, ...,10 (i.e. the components of 7) we use the statistic 
s(j) = s 7 . In effect, the summary statistics for 9* consist of point estimates for the AWBM 
parameters under the assumption of no error in the forcing inputs, u, and statistics for the 
model error parameters, 7, are intuitively based on autocovariances of residuals and squared 
residuals. Optimisation of 9* is not trivial, as the objective function may have multiple 
modes. To provide some degree of robustness, we select the best of ten Nelder-Mead simplex 



optimisations (Nelder and Mead 1965) using starting values simulated from the prior. 

Estimated marginal posterior distributions for the parameters are shown in Figure [6] 
For the joint-posterior analysis, we implemented the non-linear, heteroscedastic, regression- 



adjustment of Blum and Frangois (2010) using the uniform kernel, K e (\\ ■ ||), with scale 
parameter set to give non-zero weight to all 2,000 samples (9' 1 , s l ) ~ p(s\9)p(9) . For the indi- 
vidually estimated margins, the scale parameter was specified to select the 500 simulations 
closet to each s(j). The discrepancy between estimates for the parameters Ci, K and a e 
is particularly striking. To understand why the joint posterior regression-adjustment fails, 
Figure [7] shows prior predictive scatterplots of these parameters, each against their most 
informative summary statistic. Similar to the heather incidence example, the distribution 
of the response evidently changes as a function of the covariates in more complicated ways 
than just through the first two moments. This is the root cause of the difficulties with the 
joint regression- adjustment approach. Clearly, the fact that the unadjusted marginals are 
centred in the wrong place is unacceptable for inferential purposes. 
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Marginal Posterior Densities 




Figure 6: Estimated marginal posterior distributions for the AWBM computer model. Solid 
lines and dotted lines represent the seperately and jointly estimated marginals respectively 




Figure 7: Plots of transformed parameters c±, K and cr e against transformed summary 
statistics for samples from the prior distribution for the AWBM example. The solid vertical 
lines indicate the observed values for the summary statistics. 
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It is still difficult to validate the accuracy of the marginally- adjusted posterior, even 
though it is clearly more reasonable than without the adjustment. A tentative conclusion 
from the above analysis is that input uncertainty (through the multiplicative perturbation 
on the precipitation inputs, uj, controlled through the term cr$) may explain more of the 
model misfit than the external model discrepancy term (g). As such, the AWBM may be an 
acceptable model for the data given the inherent uncertainty in the forcing inputs. 

5 Discussion 

In problems of moderate or high dimension, conventional sampler-based ABC methods 
which use rejection or importance-weight mechanisms, are of limited use. As an alterna- 
tive, regression-adjustment methods can be useful in such situations, however their accuracy 
as approximations to Bayesian inference may be difficult to validate. 

In this article we have suggested that many regression-adjustment models are usefully 
viewed as Bayes linear approximations, which lends support to their utility in high dimen- 
sional ABC. We have also demonstrated that it is possible to efficiently combine regression- 
adjustment methods with any ABC method (even sampler-based ones) that can estimate 
a univariate marginal posterior distribution, in order to improve the quality of the ABC 
posterior approximation in higher dimensional problems. 

Our marginal- adjustment strategy allows the routine application of standard ABC meth- 
ods to problems of moderate to high dimensionality, which is comfortably beyond current 
ABC practice. We believe that regression approaches in ABC are likely to undergo further 
active development in the near future, as interest in ABC for more complex and higher 
dimensional models increases. 
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